Abstract. A one-way hashing algorithm is a deterministic algorithm 
that compresses an arbitrarily long message into a value of specified length. 
The output value represents the digest or fingerprint of the message. A 
cryptographically useful property of a one-way hashing algorithm is that 
it is infeasible to find two distinct messages that have the same digest. 
This paper proposes a one-way hashing algorithm called HAVAL. HAVAL 
compresses a message of arbitrary length into a digest of 128, 160, 192, 
224 or 256 bits. In addition, HAVAL has a parameter that controls the 
number of passes in which a message block (of 1024 bits) is processed. A message 
block can be processed in 3, 4 or 5 passes. Combining the output length 
with the number of passes provides fifteen (15) choices for practical applications 
where different levels of security are required. The algorithm is very 
efficient and particularly suited to 32-bit computers, which dominate 
the current workstation market. Experiments show that HAVAL is 60% 
faster than MD5 when 3 passes are required, 15% faster than MD5 when 
4 passes are required, and as fast as MD5 when full 5 passes are required. 
It is conjectured that finding two colliding messages requires on the order of 
$2^{n/2}$ operations, where $n$ is the number of bits in a digest. 


1 Introduction 

A one-way hashing algorithm is a deterministic algorithm that compresses an 
arbitrarily long message into a value of specified length. The output value rep- 
resents the digest or fingerprint of the input message. A very useful prop- 
erty of a one-way hashing algorithm is that it is collision intractable, i.e., it 
is computationally infeasible to find a pair of messages that have the same 
digest. One-way hashing algorithms are widely used in information authentication, 
in particular in digital signatures, and have been researched extensively 
since the invention of public key cryptography by Diffie and Hellman [DH76] 
and by Merkle [Mer78]. Theoretical results on one-way hashing algorithms were 
obtained by Damgård [Dam87, Dam90]. Results on a weaker version of one-way 
hashing algorithms, universal one-way hashing algorithms, can be found 
in [NY89, ZMI91, Rom90]. 

Recently much progress has been made in the design of practical one-way 
hashing algorithms which are suited for efficient implementation by software. 
Notable work includes the MD family which consists of three algorithms MD2, 
MD4 and MD5 [Kal92, Riv92a, Riv92b], the federal information processing standard 
for secure hash (SHS) proposed by the National Institute of Standards and 
Technology (NIST) of the United States [NIS92], and Schnorr's hashing algorithm 
FFT-Hash based on fast Fourier transformations [Sch92, Vau92]. All these 
algorithms output digests of fixed length. In particular, digests of FFT-Hash and 
the algorithms in the MD family are 128 bits long, while digests of SHS are 160 
bits long; SHS is designed primarily for NIST's proposed digital signature standard 
DSS [NIS91]. 

Despite the progress, little work has been done in the design of one-way 
hashing algorithms that can output digests of variable length. Such an algo- 
rithm would be more flexible and hence more suited for various applications 
where variable length digests are required. The aim of this research is to de- 
sign a one-way hashing algorithm that can output digests of 128, 160, 192, 224 
or 256 bits. These different lengths for digests provide practical applications 
with a broad spectrum of choices. The algorithm, which we call HAVAL, uses 
some of the principles behind the design of the MD family. In addition, HAVAL 
makes elegant use of Boolean functions recently discovered by Seberry and 
Zhang [SZ92]. These functions have nice properties, which include: 

1. they are 0-1 balanced, 

2. they are highly non-linear, 

3. they satisfy the Strict Avalanche Criterion (SAC), 

4. they cannot be transformed into one another by applying linear transformations 
to the input coordinates, and 

5. they are not mutually correlated via linear functions or via bias in output. 

In addition, each 1024-bit block of an input message can be processed in 
3, 4 or 5 passes. This adds one more dimension of flexibility to the 
algorithm. Combining the two variable parameters, number of passes and output length, 
provides practical applications with fifteen different levels of security. 

When compared with MD2, MD4, SHS and FFT-Hash, MD5 is considered 
much superior in terms of speed and security. In particular, MD5 is about 15% 
faster than SHS (see, for example, the note posted on the sci.crypt newsgroup 
by Kevin McCurley, 5 September 1992), although the latter is very likely to 
become a standard. Our preliminary experiments show that HAVAL is at least 
60% faster than MD5 when 3 passes are required, at least 15% faster than MD5 
when 4 passes are required, and about as fast as MD5 when the full 5 passes are 
required. 

Detailed specifications of HAVAL are presented in Section 2. Section 3 discusses 
the rationale behind the design of HAVAL. This is followed by a discussion 
about security issues of HAVAL in Section 4. Extensions of HAVAL in several 
directions are discussed in Section 5. Finally, Section 6 presents some concluding 
remarks. 


2 Specifications of HAVAL 

We begin with a general description of the algorithm. Detailed specifications of 
all parts of the algorithm follow. 

First we introduce some notation and conventions. Unless otherwise 
specified, we consider strings (or sequences) over GF(2). Throughout the paper, a single 
bit from GF(2) is denoted by a lower-case letter, while a string of bits over 
GF(2) is denoted by an upper-case letter. A byte is a string of 8 bits, a word 
is a string of 4 bytes (32 bits) and a block is the concatenation of 32 words (1024 
bits). We assume that the most significant bit of a byte appears at the left end 
of the byte. Similarly we assume that the most significant byte of a word comes 
at the left end of the word, and the most significant word of a block appears at 
the left end of the block. Note that a binary string $X = x_{n-1}x_{n-2}\cdots x_0$ can be 
viewed as an integer whose value is $I_X = x_{n-1}2^{n-1} + x_{n-2}2^{n-2} + \cdots + x_0 2^0$. 
Conversely, an integer $I$ can also be viewed as a binary string $X_I = x_{n-1}x_{n-2}\cdots x_0$ 
with $I = x_{n-1}2^{n-1} + x_{n-2}2^{n-2} + \cdots + x_0 2^0$. 

The modulo 2 multiplication and modulo 2 addition of $x_1, x_2 \in GF(2)$ are 
denoted by $x_1x_2$ and $x_1 \oplus x_2$ respectively. The bit-wise modulo 2 addition 
of two binary strings $S_1$ and $S_2$ of the same length is denoted by $S_1 \oplus S_2$, and 
the bit-wise modulo 2 multiplication of the two strings $S_1$ and $S_2$ is denoted by 
$S_1 \bullet S_2$. Note that $\bullet$ has precedence over $\oplus$ in computation. Another notation, $\boxplus$, 
is also used in the specifications. Assume that $S_1 = W_{1,n-1}W_{1,n-2}\cdots W_{1,0}$ and 
$S_2 = W_{2,n-1}W_{2,n-2}\cdots W_{2,0}$, where each $W_{i,j}$ is a 32-bit word. The word-wise 
integer addition modulo $2^{32}$ of the two strings is denoted by $S_1 \boxplus S_2$, i.e., 

$S_1 \boxplus S_2 = (W_{1,n-1} + W_{2,n-1} \bmod 2^{32})(W_{1,n-2} + W_{2,n-2} \bmod 2^{32}) \cdots (W_{1,0} + W_{2,0} \bmod 2^{32})$. 

Note that in the definition of $\boxplus$ we have viewed each $W_{i,j}$ as an integer in 
$[0, 2^{32} - 1]$. 
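
The three string operations can be illustrated with a short Python sketch. It assumes, purely for illustration, that a string of words is held as a list of Python integers with the most significant word first; the function names are hypothetical and not part of the specification.

```python
MASK32 = 0xFFFFFFFF  # a word is viewed as an integer in [0, 2**32 - 1]


def xor_words(s1, s2):
    """Bit-wise modulo 2 addition of two equally long word strings."""
    assert len(s1) == len(s2)
    return [a ^ b for a, b in zip(s1, s2)]


def and_words(s1, s2):
    """Bit-wise modulo 2 multiplication of two equally long word strings."""
    assert len(s1) == len(s2)
    return [a & b for a, b in zip(s1, s2)]


def boxplus(s1, s2):
    """Word-wise integer addition modulo 2**32; no carry crosses a word boundary."""
    assert len(s1) == len(s2)
    return [(a + b) & MASK32 for a, b in zip(s1, s2)]


s1 = [0xFFFFFFFF, 0x00000001]
s2 = [0x00000001, 0x00000001]
print([f"{w:08X}" for w in boxplus(s1, s2)])    # ['00000000', '00000002']
print([f"{w:08X}" for w in xor_words(s1, s2)])  # ['FFFFFFFE', '00000000']
```

The first word of the sum wraps around to zero without affecting the second word, which is what distinguishes the word-wise addition from ordinary integer addition of the two strings.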

Given a message M to be compressed, HAVAL pads (extends) M first. The 
length of (i.e., the number of bits in) the message after padding is a multiple 
of 1024, and padding is always applied even when the length of M is already a 
multiple of 1024. The last block of the padded message contains the number of 
bits in the unpadded message, the required number of bits in the digest and the 
number of passes in which each message block is processed. It also indicates the version 
number of HAVAL. The current version number is 1. 

Now suppose that the padded message is $B_{n-1}B_{n-2}\cdots B_0$, where each $B_i$ 
is a 1024-bit block. HAVAL starts from the block $B_0$ and an 8-word (256-bit) 
constant string $D_0 = D_{0,7}D_{0,6}\cdots D_{0,0}$, which is taken from the fraction part of 
$\pi = 3.1415\ldots$, and processes the message $B_{n-1}B_{n-2}\cdots B_0$ in a block-by-block 
way. More precisely, it compresses the message by repeatedly calculating 

$D_{i+1} = H(D_i, B_i)$ 

where $i$ ranges from 0 to $n-1$ and $H$ is called the updating algorithm of HAVAL. 
See Section 2.4 for the actual values of the 8 constant 32-bit words $D_{0,7}$, $D_{0,6}$, 
$\ldots$, $D_{0,0}$. 

Finally, HAVAL adjusts, if necessary, the last 256-bit string $D_n$ into a string 
of the length specified in the last block $B_{n-1}$, and outputs the adjusted string 
as the digest of the message M. In summary, HAVAL processes a message M in 
the following three steps: 

1. Pad the message $M$ so that its length becomes a multiple of 1024. The last 
(or the most significant) block of the padded message indicates the length 
of the original (unpadded) message $M$, the required length of the digest of 
$M$, the number of passes in which each block is processed and the version number of 
HAVAL. 

2. Calculate repeatedly $D_{i+1} = H(D_i, B_i)$ for $i$ from 0 to $n-1$, where $D_0$ is an 
8-word (256-bit) constant string and $n$ is the total number of blocks in the 
padded message. 

3. Adjust the 256-bit value $D_n$ obtained in the above calculation according to 
the digest length specified in the last block $B_{n-1}$, and output the adjusted 
value as the digest of the message $M$. 

These three steps are described in more detail in the following sections. 


2.1 Padding 

The purpose of padding is two-fold: to make the length of a message a multiple 
of 1024 and to let the message indicate the length of the original message, the 
required number of bits in the digest, the number of passes and the version 
number of HAVAL. HAVAL uses a 64-bit field MSGLENG to specify the length 
of an unpadded message. Thus messages of up to $(2^{64} - 1)$ bits are accepted, 
which is long enough for practical applications. HAVAL also uses a 10-bit field 
DGSTLENG to specify the required number of bits in a digest. In addition, 
HAVAL uses a 3-bit field PASS to specify the number of passes in which each message 
block is processed, and another 3-bit field VERSION to indicate the version 
number of HAVAL. The number of bits in a digest can be 128, 160, 192, 224 or 
256, while the number of passes can be 3, 4 or 5. The current version number 
of HAVAL is 1. 

HAVAL pads a message by appending a single bit 1 next to the most significant 
bit of the message, followed by zero or more 0 bits until the length 
of the (new) message is 944 modulo 1024. Then HAVAL appends to the message 
the 3-bit field VERSION, followed by the 3-bit field PASS, the 10-bit field 
DGSTLENG and the 64-bit field MSGLENG. 
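
The padding rule can be sketched at the level of individual bits. The following Python fragment is only an illustration under stated assumptions: the message is a string of '0'/'1' characters listed in the order in which bits are appended, each field is written most significant bit first, and the function name `pad_bits` is hypothetical. How these bits map onto bytes and words in a real implementation is deliberately left out.

```python
def pad_bits(message_bits: str, digest_len: int = 256, passes: int = 3,
             version: int = 1) -> str:
    """Return the padded message as a string of '0'/'1' characters."""
    assert set(message_bits) <= {"0", "1"}
    assert digest_len in (128, 160, 192, 224, 256) and passes in (3, 4, 5)
    assert len(message_bits) < 2 ** 64

    padded = message_bits + "1"
    # Zero or more 0 bits until the length is 944 modulo 1024.
    padded += "0" * ((944 - len(padded)) % 1024)
    # VERSION (3 bits), PASS (3), DGSTLENG (10), MSGLENG (64): 80 bits in total.
    padded += format(version, "03b")
    padded += format(passes, "03b")
    padded += format(digest_len, "010b")
    padded += format(len(message_bits), "064b")
    return padded


print(len(pad_bits("")))           # 1024
print(len(pad_bits("0" * 944)))    # 2048: the single 1 bit forces a second block
```

Since 944 + 3 + 3 + 10 + 64 = 1024, the padded length is always a multiple of the block size, and a message that already fills 944 bits of a block spills into one additional block.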



2.2 The Updating Algorithm H 


The updating algorithm $H$ processes a block in 3, 4 or 5 passes, as specified 
by the 3-bit field PASS in the last block. Denote by $H_1$, $H_2$, $H_3$, $H_4$ and $H_5$ the 
five passes. Now suppose that the input to $H$ is $(D_{in}, B)$, where $D_{in}$ is an 8-word 
string and $B$ is a 32-word (1024-bit) block. Let $D_{out}$ denote the 8-word output 
of $H$ on input $(D_{in}, B)$. Then the processing of $H$ can be described in the following 
way. 


$E_0 = D_{in}$; 
$E_1 = H_1(E_0, B)$; 
$E_2 = H_2(E_1, B)$; 
$E_3 = H_3(E_2, B)$; 
$E_4 = H_4(E_3, B)$; (if PASS=4, 5) 
$E_5 = H_5(E_4, B)$; (if PASS=5) 

$$D_{out} = \begin{cases} 
E_3 \boxplus E_0 & \text{if PASS=3} \\ 
E_4 \boxplus E_0 & \text{if PASS=4} \\ 
E_5 \boxplus E_0 & \text{if PASS=5} 
\end{cases}$$ 
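
The way the passes are chained and then combined with the input can be sketched in Python. This is a minimal illustration, not an implementation of HAVAL: the function name `update` is hypothetical, the passes are supplied by the caller, and the placeholder passes used below are not the real $H_1, \ldots, H_5$.

```python
MASK32 = 0xFFFFFFFF


def update(d_in, block, pass_functions, passes):
    """Compute D_out = H(D_in, B) from the pass functions H_1 .. H_passes.

    pass_functions[k] is assumed to map (8 words, 32 words) -> 8 words and to
    stand for H_{k+1}; placeholders are used here instead of the real passes.
    """
    assert passes in (3, 4, 5) and len(d_in) == 8 and len(block) == 32
    e = list(d_in)                          # E_0 = D_in
    for h in pass_functions[:passes]:       # E_k = H_k(E_{k-1}, B)
        e = h(e, block)
    # Feed-forward: word-wise addition modulo 2**32 of the last E_k and E_0.
    return [(a + b) & MASK32 for a, b in zip(e, d_in)]


# Placeholder passes (NOT the HAVAL passes): pass k adds k to every state word.
dummy = [lambda e, b, k=k: [(w + k) & MASK32 for w in e] for k in range(1, 6)]
print(update([0] * 8, [0] * 32, dummy, 3))  # [6, 6, 6, 6, 6, 6, 6, 6]
```

The final addition of $E_0$ means the output depends on the input state both through the passes and directly.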

Each of the five passes $H_1$, $H_2$, $H_3$, $H_4$ and $H_5$ has 32 rounds of operations 
and each round processes a different word from $B$. The orders in which the 
words in $B$ are processed differ from pass to pass. In addition, each pass employs 
a different Boolean function to perform bit-wise operations on words. The five 
functions employed by $H_1$, $H_2$, $H_3$, $H_4$ and $H_5$ are: 


$f_1(x_6,x_5,x_4,x_3,x_2,x_1,x_0) = x_1x_4 \oplus x_2x_5 \oplus x_3x_6 \oplus x_0x_1 \oplus x_0$ 

$f_2(x_6,x_5,x_4,x_3,x_2,x_1,x_0) = x_1x_2x_3 \oplus x_2x_4x_5 \oplus x_1x_2 \oplus x_1x_4 \oplus x_2x_6 \oplus x_3x_5 \oplus x_4x_5 \oplus x_0x_2 \oplus x_0$ 

$f_3(x_6,x_5,x_4,x_3,x_2,x_1,x_0) = x_1x_2x_3 \oplus x_1x_4 \oplus x_2x_5 \oplus x_3x_6 \oplus x_0x_3 \oplus x_0$ 

$f_4(x_6,x_5,x_4,x_3,x_2,x_1,x_0) = x_1x_2x_3 \oplus x_2x_4x_5 \oplus x_3x_4x_6 \oplus x_1x_4 \oplus x_2x_6 \oplus x_3x_4 \oplus x_3x_5 \oplus x_3x_6 \oplus x_4x_5 \oplus x_4x_6 \oplus x_0x_4 \oplus x_0$ 

$f_5(x_6,x_5,x_4,x_3,x_2,x_1,x_0) = x_1x_4 \oplus x_2x_5 \oplus x_3x_6 \oplus x_0x_1x_2x_3 \oplus x_0x_5 \oplus x_0$ 

These five Boolean functions have very nice properties when their coordinates 
are permuted. This will be stated in Section 3 together with the rationale behind 
the design of the functions. The five passes $H_1$, $H_2$, $H_3$, $H_4$ and $H_5$ are specified 
in more detail in the following sections. 
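
As an illustration, the five functions can be written directly in Python, where `&` binds more tightly than `^` just as multiplication takes precedence over modulo 2 addition in the formulas. The helper `ones_in_sequence` is a hypothetical name; it counts how many of the 128 possible inputs give output 1.

```python
from itertools import product


def f1(x6, x5, x4, x3, x2, x1, x0):
    return x1 & x4 ^ x2 & x5 ^ x3 & x6 ^ x0 & x1 ^ x0


def f2(x6, x5, x4, x3, x2, x1, x0):
    return (x1 & x2 & x3 ^ x2 & x4 & x5 ^ x1 & x2 ^ x1 & x4
            ^ x2 & x6 ^ x3 & x5 ^ x4 & x5 ^ x0 & x2 ^ x0)


def f3(x6, x5, x4, x3, x2, x1, x0):
    return x1 & x2 & x3 ^ x1 & x4 ^ x2 & x5 ^ x3 & x6 ^ x0 & x3 ^ x0


def f4(x6, x5, x4, x3, x2, x1, x0):
    return (x1 & x2 & x3 ^ x2 & x4 & x5 ^ x3 & x4 & x6 ^ x1 & x4
            ^ x2 & x6 ^ x3 & x4 ^ x3 & x5 ^ x3 & x6 ^ x4 & x5 ^ x4 & x6
            ^ x0 & x4 ^ x0)


def f5(x6, x5, x4, x3, x2, x1, x0):
    return x1 & x4 ^ x2 & x5 ^ x3 & x6 ^ x0 & x1 & x2 & x3 ^ x0 & x5 ^ x0


FUNCTIONS = (f1, f2, f3, f4, f5)


def ones_in_sequence(f):
    """Number of 1s among the 2**7 output bits of f."""
    return sum(f(*bits) for bits in product((0, 1), repeat=7))


for f in FUNCTIONS:
    print(f.__name__, ones_in_sequence(f))   # 64 of the 128 outputs are 1
```

Because only `&` and `^` are used, the same definitions also work on 32-bit words, evaluating the Boolean function on all 32 bit positions in parallel.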


Original    0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
(P1)       16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31

ord2        5 14 26 18 11 28  7 16  0 23 20 22  1 10  4  8
(P2)       30  3 21  9 17 24 29  6 19 12 15 13  2 25 31 27

ord3       19  9  4 20 28 17  8 22 29 14 25 12 24 30 16 26
(P3)       31 15  7  3  1  0 18 27 13  6 21 10 23 11  5  2

ord4       24  4  0 14  2  7 28 23 26  6 30 20 18 25 19  3
(P4)       22 11 31 21  8 27 12  9  1 29  5 15 17 10 16 13

ord5       27  3 21 26 17 11 20 29 19  0 12  7 13  8 31 10
(P5)        5  9 14 30 18  6 28 24  2 23 16 22  4  1 25 15

Table 1. Word Processing Orders 


permutations     x6 x5 x4 x3 x2 x1 x0
$\phi_{3,1}$     x1 x0 x3 x5 x6 x2 x4
$\phi_{3,2}$     x4 x2 x1 x0 x5 x3 x6
$\phi_{3,3}$     x6 x1 x2 x3 x4 x5 x0
$\phi_{4,1}$     x2 x6 x1 x4 x5 x3 x0
$\phi_{4,2}$     x3 x5 x2 x0 x1 x6 x4
$\phi_{4,3}$     x1 x4 x3 x6 x0 x2 x5
$\phi_{4,4}$     x6 x4 x0 x5 x2 x1 x3
$\phi_{5,1}$     x3 x4 x1 x0 x5 x2 x6
$\phi_{5,2}$     x6 x2 x1 x0 x3 x4 x5
$\phi_{5,3}$     x2 x6 x0 x4 x3 x1 x5
$\phi_{5,4}$     x1 x5 x3 x2 x0 x4 x6
$\phi_{5,5}$     x2 x5 x0 x6 x4 x3 x1

Table 2. Permutations on Coordinates 


Pass 1 Assume that the input to $H_1$ is $(E_0, B)$, where $E_0$ consists of 8 words 
$E_{0,7}, E_{0,6}, \ldots, E_{0,0}$ and $B$ of 32 words $W_{31}, W_{30}, \ldots, W_0$. $H_1$ processes the block 
$B$ in a word-by-word way and transforms the input into an 8-word output $E_1 = 
E_{1,7}E_{1,6}\cdots E_{1,0}$. Denote by ROT$(X, s)$ the $s$-position rotate-right operation on 
a word $X$ and by $f \circ g$ the composition of two functions $f$ and $g$ ($g$ is evaluated 
first). Then $H_1$ can be described in the following way. 

1. Let $T_{0,i} = E_{0,i}$, $0 \le i \le 7$. 

2. Repeat the following steps for $i$ from 0 to 31: 

$$P = \begin{cases} 
F_1 \circ \phi_{3,1}(T_{i,6}, T_{i,5}, T_{i,4}, T_{i,3}, T_{i,2}, T_{i,1}, T_{i,0}) & \text{if PASS=3} \\ 
F_1 \circ \phi_{4,1}(T_{i,6}, T_{i,5}, T_{i,4}, T_{i,3}, T_{i,2}, T_{i,1}, T_{i,0}) & \text{if PASS=4} \\ 
F_1 \circ \phi_{5,1}(T_{i,6}, T_{i,5}, T_{i,4}, T_{i,3}, T_{i,2}, T_{i,1}, T_{i,0}) & \text{if PASS=5} 
\end{cases}$$ 

$R = \mathrm{ROT}(P, 7) \boxplus \mathrm{ROT}(T_{i,7}, 11) \boxplus W_i$; 

$T_{i+1,7} = T_{i,6}$; $T_{i+1,6} = T_{i,5}$; $T_{i+1,5} = T_{i,4}$; $T_{i+1,4} = T_{i,3}$; 
$T_{i+1,3} = T_{i,2}$; $T_{i+1,2} = T_{i,1}$; $T_{i+1,1} = T_{i,0}$; $T_{i+1,0} = R$. 

3. Let $E_{1,i} = T_{32,i}$, $0 \le i \le 7$, and output $E_1 = E_{1,7}E_{1,6}\cdots E_{1,0}$. 

Note that the input to the $i$-th round $(T_{i,6}, T_{i,5}, T_{i,4}, T_{i,3}, T_{i,2}, T_{i,1}, T_{i,0})$ is 
permuted according to $\phi_{3,1}$ (when PASS=3), $\phi_{4,1}$ (when PASS=4) or $\phi_{5,1}$ (when 
PASS=5) before being passed to $F_1$. Here $\phi_{3,1}$, $\phi_{4,1}$ and $\phi_{5,1}$ are permutations 
on coordinates specified in Table 2, where permutations employed by the other 
four passes are also specified. $F_1$ performs bit-wise operations on its input words 
according to the Boolean function $f_1$ specified in Section 2.2. 

$F_1(X_6, X_5, X_4, X_3, X_2, X_1, X_0) = X_1 \bullet X_4 \oplus X_2 \bullet X_5 \oplus X_3 \bullet X_6 \oplus X_0 \bullet X_1 \oplus X_0$ 

The result of $F_1$ is rotated and added (modulo $2^{32}$) to the rotated version of 
$T_{i,7}$. The $i$-th word $W_i$ in $B$ is also added to the rotated version of $T_{i,7}$. The 
sum is used to substitute (the old) $T_{i,7}$. After the substitution, the 8 words 
$T_{i,7}, T_{i,6}, \ldots, T_{i,0}$ are shifted with $T_{i,7}$ being replaced by $T_{i,6}$, $T_{i,6}$ by $T_{i,5}$, $\ldots$, 
$T_{i,1}$ by $T_{i,0}$, and $T_{i,0}$ by $T_{i,7}$. These words are then used as input to the $(i+1)$-th 
round. Finally, $T_{32}$ is output as a result. 
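
The round structure of Pass 1 can be sketched in Python for the case PASS=3. This is an illustrative reading of the steps above rather than a reference implementation. It assumes that `e0[j]` holds $E_{0,j}$, that `block[i]` holds $W_i$, and that row $\phi_{3,1}$ of Table 2 lists the arguments handed to $F_1$; all function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def rot(x, s):
    """ROT(X, s): rotate the 32-bit word x right by s positions."""
    return ((x >> s) | (x << (32 - s))) & MASK32


def big_f1(x6, x5, x4, x3, x2, x1, x0):
    """F_1: the Boolean function f_1 applied bit-wise to 32-bit words."""
    return x1 & x4 ^ x2 & x5 ^ x3 & x6 ^ x0 & x1 ^ x0


def phi_3_1(x6, x5, x4, x3, x2, x1, x0):
    """Table 2, row phi_{3,1}: (x6,...,x0) -> (x1, x0, x3, x5, x6, x2, x4)."""
    return (x1, x0, x3, x5, x6, x2, x4)


def pass1(e0, block):
    """H_1 for PASS=3. e0[j] holds E_{0,j}; block[i] holds W_i."""
    assert len(e0) == 8 and len(block) == 32
    t = list(e0)                                   # t[j] = T_{i,j}
    for i in range(32):
        p = big_f1(*phi_3_1(t[6], t[5], t[4], t[3], t[2], t[1], t[0]))
        r = (rot(p, 7) + rot(t[7], 11) + block[i]) & MASK32
        t = [r] + t[:7]            # T_{i+1,0} = R and T_{i+1,j} = T_{i,j-1}
    return t                                       # t[j] = E_{1,j}


print(pass1([0] * 8, [0] * 31 + [5]))  # [5, 0, 0, 0, 0, 0, 0, 0]
```

In the example only $W_{31}$ is non-zero, so the state stays zero until the last round, where the word is added into $R$ and becomes $E_{1,0}$.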


Pass 2 Assume that the input to $H_2$ is $(E_1, B)$. $H_2$ processes the words in $B$ 
according to the word processing order $\mathrm{ord}_2$ specified in Table 1. It employs in 
its computation 32 constant words $K_{2,31}, K_{2,30}, \ldots, K_{2,0}$, all of which are taken 
from the fraction part of $\pi$. The actual values of these constant words are defined 
in Section 2.4. $H_2$ processes the words as follows: 

1. Let $T_{0,i} = E_{1,i}$, $0 \le i \le 7$. 

2. Repeat the following steps for $i$ from 0 to 31: 

$$P = \begin{cases} 
F_2 \circ \phi_{3,2}(T_{i,6}, T_{i,5}, T_{i,4}, T_{i,3}, T_{i,2}, T_{i,1}, T_{i,0}) & \text{if PASS=3} \\ 
F_2 \circ \phi_{4,2}(T_{i,6}, T_{i,5}, T_{i,4}, T_{i,3}, T_{i,2}, T_{i,1}, T_{i,0}) & \text{if PASS=4} \\ 
F_2 \circ \phi_{5,2}(T_{i,6}, T_{i,5}, T_{i,4}, T_{i,3}, T_{i,2}, T_{i,1}, T_{i,0}) & \text{if PASS=5} 
\end{cases}$$ 

$R = \mathrm{ROT}(P, 7) \boxplus \mathrm{ROT}(T_{i,7}, 11) \boxplus W_{\mathrm{ord}_2(i)} \boxplus K_{2,i}$; 

$T_{i+1,7} = T_{i,6}$; $T_{i+1,6} = T_{i,5}$; $T_{i+1,5} = T_{i,4}$; $T_{i+1,4} = T_{i,3}$; 
$T_{i+1,3} = T_{i,2}$; $T_{i+1,2} = T_{i,1}$; $T_{i+1,1} = T_{i,0}$; $T_{i+1,0} = R$. 

3. Let $E_{2,i} = T_{32,i}$, $0 \le i \le 7$, and output $E_2 = E_{2,7}E_{2,6}\cdots E_{2,0}$. 


Similar to $H_1$, $(T_{i,6}, T_{i,5}, T_{i,4}, T_{i,3}, T_{i,2}, T_{i,1}, T_{i,0})$ is permuted according to 
$\phi_{3,2}$, $\phi_{4,2}$ or $\phi_{5,2}$ before being passed to $F_2$, where $\phi_{3,2}$, $\phi_{4,2}$ and $\phi_{5,2}$ are specified 
in Table 2. $F_2$ performs bit-wise operations on its 7 input words according to 
the Boolean function $f_2$: 

$F_2(X_6, X_5, X_4, X_3, X_2, X_1, X_0) = X_1 \bullet X_2 \bullet X_3 \oplus X_2 \bullet X_4 \bullet X_5 \oplus X_1 \bullet X_2 \oplus X_1 \bullet X_4 \oplus X_2 \bullet X_6 \oplus X_3 \bullet X_5 \oplus X_4 \bullet X_5 \oplus X_0 \bullet X_2 \oplus X_0$ 

The output value of $F_2$ is rotated and added to the rotated version of $T_{i,7}$. The 
$i$-th word $W_{\mathrm{ord}_2(i)}$ is also added to the rotated version of $T_{i,7}$. In addition, a 
constant $K_{2,i}$ which is unique to $i$ is added to the rotated version of $T_{i,7}$. As in 
$H_1$, the 8 words are shifted before proceeding to the next round of operations. 
The output of $H_2$ is the result of the last round. 


Pass 3 The input to $H_3$ is $(E_2, B)$. $H_3$ processes the words in the block $B$ 
according to the word processing order $\mathrm{ord}_3$ specified in Table 1. $H_3$ also 
employs 32 constant words $K_{3,31}, K_{3,30}, \ldots, K_{3,0}$, all of which are taken from 
the fraction part of $\pi$. 

1. Let $T_{0,i} = E_{2,i}$, $0 \le i \le 7$. 

2. Repeat the following steps for $i$ from 0 to 31: 

$$P = \begin{cases} 
F_3 \circ \phi_{3,3}(T_{i,6}, T_{i,5}, T_{i,4}, T_{i,3}, T_{i,2}, T_{i,1}, T_{i,0}) & \text{if PASS=3} \\ 
F_3 \circ \phi_{4,3}(T_{i,6}, T_{i,5}, T_{i,4}, T_{i,3}, T_{i,2}, T_{i,1}, T_{i,0}) & \text{if PASS=4} \\ 
F_3 \circ \phi_{5,3}(T_{i,6}, T_{i,5}, T_{i,4}, T_{i,3}, T_{i,2}, T_{i,1}, T_{i,0}) & \text{if PASS=5} 
\end{cases}$$ 

$R = \mathrm{ROT}(P, 7) \boxplus \mathrm{ROT}(T_{i,7}, 11) \boxplus W_{\mathrm{ord}_3(i)} \boxplus K_{3,i}$; 

$T_{i+1,7} = T_{i,6}$; $T_{i+1,6} = T_{i,5}$; $T_{i+1,5} = T_{i,4}$; $T_{i+1,4} = T_{i,3}$; 
$T_{i+1,3} = T_{i,2}$; $T_{i+1,2} = T_{i,1}$; $T_{i+1,1} = T_{i,0}$; $T_{i+1,0} = R$. 

3. Let $E_{3,i} = T_{32,i}$, $0 \le i \le 7$, and output $E_3 = E_{3,7}E_{3,6}\cdots E_{3,0}$. 


$F_3$ performs bit-wise operations according to the Boolean function $f_3$: 

$F_3(X_6, X_5, X_4, X_3, X_2, X_1, X_0) = X_1 \bullet X_2 \bullet X_3 \oplus X_1 \bullet X_4 \oplus X_2 \bullet X_5 \oplus X_3 \bullet X_6 \oplus X_0 \bullet X_3 \oplus X_0$ 


Pass 4 This pass is executed only when four or five passes are required. The input 
to $H_4$ is $(E_3, B)$. The order in which the words in the block $B$ are processed is 
specified by $\mathrm{ord}_4$ in Table 1. 32 constant words, denoted by $K_{4,31}, K_{4,30}, \ldots, K_{4,0}$, 
are employed by $H_4$. These constants are unique to $H_4$ and all taken from the 
fraction part of $\pi$. 

1. Let $T_{0,i} = E_{3,i}$, $0 \le i \le 7$. 

2. Repeat the following steps for $i$ from 0 to 31: 

$$P = \begin{cases} 
F_4 \circ \phi_{4,4}(T_{i,6}, T_{i,5}, T_{i,4}, T_{i,3}, T_{i,2}, T_{i,1}, T_{i,0}) & \text{if PASS=4} \\ 
F_4 \circ \phi_{5,4}(T_{i,6}, T_{i,5}, T_{i,4}, T_{i,3}, T_{i,2}, T_{i,1}, T_{i,0}) & \text{if PASS=5} 
\end{cases}$$ 

$R = \mathrm{ROT}(P, 7) \boxplus \mathrm{ROT}(T_{i,7}, 11) \boxplus W_{\mathrm{ord}_4(i)} \boxplus K_{4,i}$; 

$T_{i+1,7} = T_{i,6}$; $T_{i+1,6} = T_{i,5}$; $T_{i+1,5} = T_{i,4}$; $T_{i+1,4} = T_{i,3}$; 
$T_{i+1,3} = T_{i,2}$; $T_{i+1,2} = T_{i,1}$; $T_{i+1,1} = T_{i,0}$; $T_{i+1,0} = R$. 

3. Let $E_{4,i} = T_{32,i}$, $0 \le i \le 7$, and output $E_4 = E_{4,7}E_{4,6}\cdots E_{4,0}$. 


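$F_4$ performs bit-wise operations on its input words according to the Boolean 
function $f_4$: 

$F_4(X_6, X_5, X_4, X_3, X_2, X_1, X_0) = X_1 \bullet X_2 \bullet X_3 \oplus X_2 \bullet X_4 \bullet X_5 \oplus X_3 \bullet X_4 \bullet X_6 \oplus X_1 \bullet X_4 \oplus X_2 \bullet X_6 \oplus X_3 \bullet X_4 \oplus X_3 \bullet X_5 \oplus X_3 \bullet X_6 \oplus X_4 \bullet X_5 \oplus X_4 \bullet X_6 \oplus X_0 \bullet X_4 \oplus X_0$ 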



Pass 5 This pass is executed only when five passes are required. The input 
to $H_5$ is $(E_4, B)$. The order in which the words in the block $B$ are processed is 
specified by $\mathrm{ord}_5$ in Table 1. The 32 constant words employed by $H_5$ are denoted 
by $K_{5,31}, K_{5,30}, \ldots, K_{5,0}$. 

1. Let $T_{0,i} = E_{4,i}$, $0 \le i \le 7$. 

2. Repeat the following steps for $i$ from 0 to 31: 

$P = F_5 \circ \phi_{5,5}(T_{i,6}, T_{i,5}, T_{i,4}, T_{i,3}, T_{i,2}, T_{i,1}, T_{i,0})$; 

$R = \mathrm{ROT}(P, 7) \boxplus \mathrm{ROT}(T_{i,7}, 11) \boxplus W_{\mathrm{ord}_5(i)} \boxplus K_{5,i}$; 

$T_{i+1,7} = T_{i,6}$; $T_{i+1,6} = T_{i,5}$; $T_{i+1,5} = T_{i,4}$; $T_{i+1,4} = T_{i,3}$; 
$T_{i+1,3} = T_{i,2}$; $T_{i+1,2} = T_{i,1}$; $T_{i+1,1} = T_{i,0}$; $T_{i+1,0} = R$. 

3. Let $E_{5,i} = T_{32,i}$, $0 \le i \le 7$, and output $E_5 = E_{5,7}E_{5,6}\cdots E_{5,0}$. 


$F_5$ performs bit-wise operations on its input words according to the Boolean 
function $f_5$: 

$F_5(X_6, X_5, X_4, X_3, X_2, X_1, X_0) = X_1 \bullet X_4 \oplus X_2 \bullet X_5 \oplus X_3 \bullet X_6 \oplus X_0 \bullet X_1 \bullet X_2 \bullet X_3 \oplus X_0 \bullet X_5 \oplus X_0$ 


2.3 Tailoring the Last Output of H 


Recall that the last string $D_n = D_{n,7}D_{n,6}\cdots D_{n,0}$ output by $H$ is of 256 bits. 
$D_n$ is used directly as the digest of $M$ if a 256-bit digest is required. Otherwise, 
$D_n$ is tailored into a string of the specified length. We discuss the four cases that 
need adjustment to $D_n$. These four cases are (1) Case-1 when 128-bit digests are 
required, (2) Case-2 when 160-bit digests are required, (3) Case-3 when 192-bit 
digests are required and (4) Case-4 when 224-bit digests are required. In the 
following discussions, we will use a superscript to indicate the length of a string. 
For instance, if $X$ is a 2-bit string, we use $X^{[2]}$ to indicate explicitly the length 
of $X$. 

Case-1 (128-bit digest): We divide $D_{n,7}$, $D_{n,6}$, $D_{n,5}$ and $D_{n,4}$ in the following 
way 

$D_{n,7} = X_{7,3}^{[8]} X_{7,2}^{[8]} X_{7,1}^{[8]} X_{7,0}^{[8]}$, 
$D_{n,6} = X_{6,3}^{[8]} X_{6,2}^{[8]} X_{6,1}^{[8]} X_{6,0}^{[8]}$, 
$D_{n,5} = X_{5,3}^{[8]} X_{5,2}^{[8]} X_{5,1}^{[8]} X_{5,0}^{[8]}$, 
$D_{n,4} = X_{4,3}^{[8]} X_{4,2}^{[8]} X_{4,1}^{[8]} X_{4,0}^{[8]}$. 

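The 128-bit digest is $Y_3Y_2Y_1Y_0$, where 

$Y_3 = D_{n,3} \boxplus (X_{7,3}^{[8]} X_{6,2}^{[8]} X_{5,1}^{[8]} X_{4,0}^{[8]})$, 
$Y_2 = D_{n,2} \boxplus (X_{7,2}^{[8]} X_{6,1}^{[8]} X_{5,0}^{[8]} X_{4,3}^{[8]})$, 
$Y_1 = D_{n,1} \boxplus (X_{7,1}^{[8]} X_{6,0}^{[8]} X_{5,3}^{[8]} X_{4,2}^{[8]})$, 
$Y_0 = D_{n,0} \boxplus (X_{7,0}^{[8]} X_{6,3}^{[8]} X_{5,2}^{[8]} X_{4,1}^{[8]})$. 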



Case-2 (160-bit digest): We divide $D_{n,7}$, $D_{n,6}$ and $D_{n,5}$ in the following way 

$D_{n,7} = X_{7,4}^{[7]} X_{7,3}^{[6]} X_{7,2}^{[7]} X_{7,1}^{[6]} X_{7,0}^{[6]}$, 
$D_{n,6} = X_{6,4}^{[7]} X_{6,3}^{[6]} X_{6,2}^{[7]} X_{6,1}^{[6]} X_{6,0}^{[6]}$, 
$D_{n,5} = X_{5,4}^{[7]} X_{5,3}^{[6]} X_{5,2}^{[7]} X_{5,1}^{[6]} X_{5,0}^{[6]}$. 

Then the 160-bit digest $Y_4Y_3Y_2Y_1Y_0$ is obtained by computing 

$Y_4 = D_{n,4} \boxplus (X_{7,4}^{[7]} X_{6,3}^{[6]} X_{5,2}^{[7]})$, 
$Y_3 = D_{n,3} \boxplus (X_{7,3}^{[6]} X_{6,2}^{[7]} X_{5,1}^{[6]})$, 
$Y_2 = D_{n,2} \boxplus (X_{7,2}^{[7]} X_{6,1}^{[6]} X_{5,0}^{[6]})$, 
$Y_1 = D_{n,1} \boxplus (X_{7,1}^{[6]} X_{6,0}^{[6]} X_{5,4}^{[7]})$, 
$Y_0 = D_{n,0} \boxplus (X_{7,0}^{[6]} X_{6,4}^{[7]} X_{5,3}^{[6]})$. 
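
The 160-bit case can be sketched in Python as follows. The sketch rests on assumptions that should be checked against a reference implementation: the pieces of $D_{n,7}$, $D_{n,6}$ and $D_{n,5}$ are taken in the cyclic pattern $Y_j = D_{n,j} \boxplus (X_{7,j} X_{6,j-1} X_{5,j-2})$ with the second subscripts reduced modulo 5, and each concatenated string is viewed as an integer (zero-extended to a word) before the addition. The names `split`, `WIDTH` and `tailor_160` are hypothetical.

```python
MASK32 = 0xFFFFFFFF
WIDTH = {4: 7, 3: 6, 2: 7, 1: 6, 0: 6}   # bit lengths of X_4 .. X_0 (7+6+7+6+6 = 32)


def split(word):
    """Split a 32-bit word into X_4 (most significant) .. X_0, keyed by subscript."""
    return {
        4: (word >> 25) & 0x7F,
        3: (word >> 19) & 0x3F,
        2: (word >> 12) & 0x7F,
        1: (word >> 6) & 0x3F,
        0: word & 0x3F,
    }


def tailor_160(d):
    """d[j] holds D_{n,j}; returns [Y_4, Y_3, Y_2, Y_1, Y_0]."""
    assert len(d) == 8
    x7, x6, x5 = split(d[7]), split(d[6]), split(d[5])
    digest = []
    for j in (4, 3, 2, 1, 0):
        a, b, c = j, (j - 1) % 5, (j - 2) % 5   # subscripts of the three pieces
        # Concatenate X_{7,a} X_{6,b} X_{5,c} and view the result as an integer.
        folded = ((x7[a] << WIDTH[b]) | x6[b]) << WIDTH[c] | x5[c]
        digest.append((d[j] + folded) & MASK32)
    return digest


d = [0] * 7 + [0xFFFFFFFF]                      # only D_{n,7} is non-zero
print([f"{y:05X}" for y in tailor_160(d)])
# ['FE000', '7E000', '7F000', '7E000', '7E000']
```

Every bit of the three discarded words ends up in exactly one of the five output words, so no part of $D_n$ is simply thrown away.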


Case-3 (192-bit digest): Divide $D_{n,7}$ and $D_{n,6}$ into 

$D_{n,7} = X_{7,5}^{[6]} X_{7,4}^{[5]} X_{7,3}^{[5]} X_{7,2}^{[6]} X_{7,1}^{[5]} X_{7,0}^{[5]}$, 
$D_{n,6} = X_{6,5}^{[6]} X_{6,4}^{[5]} X_{6,3}^{[5]} X_{6,2}^{[6]} X_{6,1}^{[5]} X_{6,0}^{[5]}$. 

Let 

$Y_5 = D_{n,5} \boxplus (X_{7,5}^{[6]} X_{6,4}^{[5]})$, 
$Y_4 = D_{n,4} \boxplus (X_{7,4}^{[5]} X_{6,3}^{[5]})$, 
$Y_3 = D_{n,3} \boxplus (X_{7,3}^{[5]} X_{6,2}^{[6]})$, 
$Y_2 = D_{n,2} \boxplus (X_{7,2}^{[6]} X_{6,1}^{[5]})$, 
$Y_1 = D_{n,1} \boxplus (X_{7,1}^{[5]} X_{6,0}^{[5]})$, 
$Y_0 = D_{n,0} \boxplus (X_{7,0}^{[5]} X_{6,5}^{[6]})$. 

Output $Y_5Y_4Y_3Y_2Y_1Y_0$ as the digest. 

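Case-4 (224-bit digest): We divide $D_{n,7}$ into 

$D_{n,7} = X_{7,6}^{[5]} X_{7,5}^{[5]} X_{7,4}^{[4]} X_{7,3}^{[5]} X_{7,2}^{[4]} X_{7,1}^{[5]} X_{7,0}^{[4]}$. 

The 224-bit digest is $Y_6Y_5Y_4Y_3Y_2Y_1Y_0$, where 

$Y_6 = D_{n,6} \boxplus X_{7,0}^{[4]}$, 
$Y_5 = D_{n,5} \boxplus X_{7,1}^{[5]}$, 
$Y_4 = D_{n,4} \boxplus X_{7,2}^{[4]}$, 
$Y_3 = D_{n,3} \boxplus X_{7,3}^{[5]}$, 
$Y_2 = D_{n,2} \boxplus X_{7,4}^{[4]}$, 
$Y_1 = D_{n,1} \boxplus X_{7,5}^{[5]}$, 
$Y_0 = D_{n,0} \boxplus X_{7,6}^{[5]}$. 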

2.4 The Constants from $\pi$ 

HAVAL uses a total of 136 constant 32-bit words. Among them, 8 words are used 
as initial values $D_{0,7}, D_{0,6}, \ldots, D_{0,0}$, 32 words are employed by Pass 2 as $K_{2,31}$, 
$K_{2,30}, \ldots, K_{2,0}$, 32 words by Pass 3 as $K_{3,31}, K_{3,30}, \ldots, K_{3,0}$, 32 words 
by Pass 4 as $K_{4,31}, K_{4,30}, \ldots, K_{4,0}$, and the remaining 32 words by Pass 5 
as $K_{5,31}, K_{5,30}, \ldots, K_{5,0}$. The first 8 constant words correspond to the 
first 256 bits of the fraction part of $\pi$. The 32 constant words used in Pass 2 
correspond to the next 1024 bits of the fraction part of $\pi$, which are followed by 
the 32 constant words used by Pass 3, the 32 constant words used by Pass 4 and 
the 32 constant words used by Pass 5. The 136 constant words are listed 
below in hexadecimal form. They appear in the following order: 

1. $D_{0,7}, D_{0,6}, \ldots, D_{0,0}$, 

2. $K_{2,31}, K_{2,30}, \ldots, K_{2,0}$, 

3. $K_{3,31}, K_{3,30}, \ldots, K_{3,0}$, 

4. $K_{4,31}, K_{4,30}, \ldots, K_{4,0}$, 

5. $K_{5,31}, K_{5,30}, \ldots, K_{5,0}$. 

243F6A88 85A308D3 13198A2E 03707344 A4093822 299F31D0 082EFA98 EC4E6C89 

452821E6 38D01377 BE5466CF 34E90C6C C0AC29B7 C97C50DD 3F84D5B5 B5470917 

9216D5D9 8979FB1B D1310BA6 98DFB5AC 2FFD72DB D01ADFB7 B8E1AFED 6A267E96 

BA7C9045 F12C7F99 24A19947 B3916CF7 0801F2E2 858EFC16 636920D8 71574E69 

A458FEA3 F4933D7E 0D95748F 728EB658 718BCD58 82154AEE 7B54A41D C25A59B5 

9C30D539 2AF26013 C5D1B023 286085F0 CA417918 B8DB38EF 8E79DCB0 603A180E 

6C9E0E8B B01E8A3E D71577C1 BD314B27 78AF2FDA 55605C60 E65525F3 AA55AB94 

57489862 63E81440 55CA396A 2AAB10B6 B4CC5C34 1141E8CE A15486AF 7C72E993 

B3EE1411 636FBC2A 2BA9C55D 741831F6 CE5C3E16 9B87931E AFD6BA33 6C24CF5C 

7A325381 28958677 3B8F4898 6B4BB9AF C4BFE81B 66282193 61D809CC FB21A991 

487CAC60 5DEC8032 EF845D5D E98575B1 DC262302 EB651B88 23893E81 D396ACC5 

0F6D6FF3 83F44239 2E0B4482 A4842004 69C8F04A 9E1F9B5E 21C66842 F6E96C9A 

670C9C61 ABD388F0 6A51A0D2 D8542F68 960FA728 AB5133A3 6EEF0B6C 137A3BE4 

BA3BF050 7EFB2A98 A1F1651D 39AF0176 66CA593E 82430E88 8CEE8619 456F9FB4 

7D84A5C3 3B8B5EBE E06F75D8 85C12073 401A449F 56C16AA6 4ED3AA62 363F7706 

1BFEDF72 429B023D 37D0D724 D00A1248 DB0FEAD3 49F1C09B 075372C9 80991B7B 

25D479D8 F6E8DEF7 E3FE501A B6794C3B 976CE0BD 04C006BA C1A94FB6 409F60C4 



We generated these constant words by Maple (Version 5 on a SPARCstation) 
with the following program: 

printlevel := -1; 
Digits := 2000;                          # decimal digits of precision for Pi 
pifrac := evalf(Pi) - 3;                 # fraction part of Pi 
K := 2 ^ 32; 
for i from 1 by 1 while i <= 136 do 
    nextword := trunc(pifrac * K);       # next 32 bits of the fraction 
    lprint(convert(nextword, hex)); 
    pifrac := frac(pifrac * K); 
od; 
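
For readers without Maple, the same words can be regenerated with a short Python sketch. It is not the authors' program: it computes $\pi$ in binary fixed point with Machin's formula $\pi = 16 \arctan(1/5) - 4 \arctan(1/239)$, and the function name `pi_fraction_words` and the 64 guard bits are choices made here for illustration.

```python
def pi_fraction_words(count: int, guard_bits: int = 64) -> list[int]:
    """Return the first `count` 32-bit words of the fraction part of pi."""
    bits = 32 * count + guard_bits
    one = 1 << bits                       # fixed-point representation of 1.0

    def arctan_inv(x: int) -> int:
        """arctan(1/x) scaled by 2**bits, from the alternating Taylor series."""
        total = term = one // x
        x2 = x * x
        n = 1
        while term:
            term //= x2
            n += 2
            if (n // 2) % 2:              # n = 3, 7, 11, ... are subtracted
                total -= term // n
            else:
                total += term // n
        return total

    pi_scaled = 4 * (4 * arctan_inv(5) - arctan_inv(239))
    frac = (pi_scaled - 3 * one) >> guard_bits    # drop the guard bits
    return [(frac >> (32 * (count - 1 - i))) & 0xFFFFFFFF for i in range(count)]


words = pi_fraction_words(136)
print(" ".join(f"{w:08X}" for w in words[:8]))
```

The first line of output should reproduce the first row of the table above, starting with 243F6A88. The guard bits absorb the truncation error of the integer divisions.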

3 The Design Rationale 

3.1 Designing the Boolean Functions 

The five Boolean functions $f_1$, $f_2$, $f_3$, $f_4$ and $f_5$ used by $H_1$, $H_2$, $H_3$, $H_4$ and 
$H_5$ are of central importance to the hashing algorithm. We first introduce a few 
definitions before going into their design details. 

Denote by $V_n$ the vector space of $n$-tuples of elements from GF(2), 
where $n$ is a positive integer. A Boolean function is a function from $V_n$ to 
GF(2). Note that a Boolean function $f$ from $V_n$ to GF(2) can be "reduced" 
to a unique polynomial in $n$ coordinate variables $x_n, x_{n-1}, \ldots, x_1$. In the following 
discussions, we will identify the function $f$ with its unique polynomial 
$f(x_n, x_{n-1}, \ldots, x_1)$. The sequence of the function $f$ is defined as the concatenation 
of the $2^n$ output bits of $f(x_n, x_{n-1}, \ldots, x_1)$ when $x_n, x_{n-1}, \ldots, x_1$ vary 
from $0, 0, \ldots, 0$ to $1, 1, \ldots, 1$. The function $f$ is called a linear function if $f$ has 
the form $f(x_n, x_{n-1}, \ldots, x_1) = a_nx_n \oplus a_{n-1}x_{n-1} \oplus \cdots \oplus a_1x_1 \oplus a_0$, where 
$a_i \in GF(2)$. 

We say that a function $f$ from $V_n$ to GF(2) is 0-1 balanced if the number of 1 
bits and the number of 0 bits in the sequence of $f$ are the same, both being $2^{n-1}$. 
Let $g$ be another function from $V_n$ to GF(2). The distance between $f$ and $g$ is the 
number of positions in the sequences of $f$ and $g$ at which the two functions have 
different values. The non-linearity of the function $f$ is defined as the minimum 
distance between $f$ and all linear functions from $V_n$ to GF(2). When $n = 2k$ 
for some $k \ge 1$, the maximum non-linearity a function from $V_n$ to GF(2) can 
attain is $2^{2k-1} - 2^{k-1}$. Such a function is called a bent function [Rot76]. We say 
that $f$ satisfies the Strict Avalanche Criterion (SAC) if for every $1 \le i \le n$, 
complementing $x_i$ results in the output of $f$ being complemented 50% of the 
time over all possible input vectors. 
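
These three definitions translate directly into brute-force checks. The Python sketch below is only meant to make the definitions concrete; the function names are hypothetical, and the example function $x_1x_2 \oplus x_3x_4$ is a standard bent function on $V_4$ chosen here for illustration.

```python
from itertools import product


def sequence(f, n):
    """Output bits of f as (x_n, ..., x_1) runs from 0...0 to 1...1."""
    return [f(*bits) for bits in product((0, 1), repeat=n)]


def is_balanced(f, n):
    return sum(sequence(f, n)) == 2 ** (n - 1)


def nonlinearity(f, n):
    """Minimum distance from f to the 2**(n+1) linear functions (a_0 included)."""
    seq = sequence(f, n)
    inputs = list(product((0, 1), repeat=n))
    best = 2 ** n
    for coeffs in product((0, 1), repeat=n):
        lin = [sum(a & x for a, x in zip(coeffs, xs)) & 1 for xs in inputs]
        d = sum(s != l for s, l in zip(seq, lin))
        best = min(best, d, 2 ** n - d)   # 2**n - d is the distance for a_0 = 1
    return best


def satisfies_sac(f, n):
    """True if complementing any single input flips the output for half the inputs."""
    inputs = list(product((0, 1), repeat=n))
    for i in range(n):
        flips = 0
        for xs in inputs:
            ys = list(xs)
            ys[i] ^= 1
            flips += f(*xs) != f(*ys)
        if flips != 2 ** (n - 1):
            return False
    return True


def bent4(x4, x3, x2, x1):
    return x1 & x2 ^ x3 & x4


print(is_balanced(bent4, 4))     # False: a bent function is not 0-1 balanced
print(nonlinearity(bent4, 4))    # 6, i.e. 2**3 - 2**1
print(satisfies_sac(bent4, 4))   # True
```

The exhaustive search over all linear functions is exponential in $n$, which is acceptable for the seven-variable functions considered in this paper.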

Two functions $f$ and $g$ are linearly equivalent (in structure) if $f$ can be 
transformed into $g$ via linear transformation of coordinates and complementation 
of functions, i.e., there is a non-singular $n \times n$ matrix $A$ over GF(2) as well as a 
vector $B \in V_n$ such that $f(xA \oplus B) = g(x)$ or $f(xA \oplus B) \oplus 1 = g(x)$, where 
$x = (x_n, x_{n-1}, \ldots, x_1)$. Otherwise we say that $f$ and $g$ are linearly inequivalent. 
A set of functions is said to be linearly inequivalent if all pairs of functions from the 
set are linearly inequivalent. 

$f$ and $g$ are mutually output-uncorrelated if $f$, $g$ and $f \oplus g$ are all 0-1 balanced 
non-linear functions. A set of functions is mutually output-uncorrelated if all 
pairs of functions in the set are mutually output-uncorrelated. The set is said 
to be perfectly output-uncorrelated if any non-zero linear combination of the functions 
in the set results in a 0-1 balanced non-linear function. 

Linear equivalence and output-correlation can be used to examine from two 
different angles the structural similarity among functions. Our goal is to design 
five Boolean functions in seven variables so that each of the functions has the 
following properties P1, P2 and P3. 

P1 Being 0-1 balanced. 

P2 Having a high non-linearity. 

P3 Satisfying the Strict Avalanche Criterion (SAC). 

In addition, as a set of functions, they have the following properties P4 and P5. 

P4 Being linearly inequivalent in structure. 

P5 Being mutually output-uncorrelated. 

These properties are considered desirable for a cryptographic primitive 
such as a one-way hashing algorithm. P1 ensures that a function outputs 
a 0 bit and a 1 bit with the same probability 0.5 when the input to the function 
is picked randomly and uniformly over all possible vectors. P2 is desirable 
since a linear function would render a cryptographic algorithm easily breakable. 
P3 brings a good avalanche effect to a cryptographic algorithm. P4 ensures that 
functions employed by a cryptographic algorithm bear no resemblance in structure 
(with respect to linear transformation of coordinates and complementation 
of functions). Finally, P5 ensures that the sequences of the functions are not 
mutually correlated either via linear functions or via the bias in output bits. 

In [SZ92], Seberry and Zhang presented a novel method for constructing 
Boolean functions that have the properties P1, P2 and P3. In particular, they 
showed that given a bent function from $V_{2k}$ to GF(2), where $k \ge 1$, one can 
obtain a Boolean function from $V_{2k+1}$ to GF(2) that has the properties P1, 
P2 and P3 and a non-linearity of $2^{2k} - 2^k$. Here is their construction method. 
Let $g(x_{2k}, x_{2k-1}, \ldots, x_1)$ be a bent function, and let $\ell(x_{2k}, x_{2k-1}, \ldots, x_1)$ be an 
arbitrary non-constant linear function. Let 

$h(x_{2k}, x_{2k-1}, \ldots, x_1) = g(x_{2k}, x_{2k-1}, \ldots, x_1) \oplus \ell(x_{2k}, x_{2k-1}, \ldots, x_1)$. 

Note that $h(x_{2k}, x_{2k-1}, \ldots, x_1)$ is also a bent function. Also note that a bent 
function is not 0-1 balanced. Now assume that both the function sequence of 
$g(x_{2k}, x_{2k-1}, \ldots, x_1)$ and that of $h(x_{2k}, x_{2k-1}, \ldots, x_1)$ have more 1s (or 0s) than 
0s (or 1s). Then the following function from $V_{2k+1}$ to GF(2) 

$f(x_{2k}, x_{2k-1}, \ldots, x_1, x_0)$ 
$= (x_0 \oplus 1)\, g(x_{2k}, x_{2k-1}, \ldots, x_1) \oplus x_0\, (h(x_{2k}, x_{2k-1}, \ldots, x_1) \oplus 1)$ 
$= g(x_{2k}, x_{2k-1}, \ldots, x_1) \oplus x_0\, \ell(x_{2k}, x_{2k-1}, \ldots, x_1) \oplus x_0$ 

has properties P1, P2 and P3. 
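
The construction can be tried out on the smallest case $k = 1$. The Python sketch below is an illustration only: the bent function $g = x_1x_2$ and the linear function $\ell = x_1$ are chosen here as an example, and the helper names are hypothetical.

```python
from itertools import product


def construct(g, ell):
    """Return f(x_{2k},...,x_1,x_0) = g(...) xor x_0*ell(...) xor x_0."""
    def f(*xs):
        *rest, x0 = xs
        return g(*rest) ^ (x0 & ell(*rest)) ^ x0
    return f


def ones(f, n):
    """Number of 1s in the sequence of an n-variable function."""
    return sum(f(*bits) for bits in product((0, 1), repeat=n))


# Smallest case k = 1: g = x1*x2 is bent on V_2 and ell = x1 is linear.
g = lambda x2, x1: x1 & x2
ell = lambda x2, x1: x1
h = lambda x2, x1: g(x2, x1) ^ ell(x2, x1)

# Both g and h have more 0s than 1s, as the construction requires.
print(ones(g, 2), ones(h, 2))   # 1 1
f = construct(g, ell)
print(ones(f, 3))               # 4: f is 0-1 balanced on V_3
```

Intuitively, $x_0$ selects between $g$ and the complement of $h$; since $g$ and $h$ are biased in the same direction, the two halves of the sequence of $f$ have opposite biases that cancel.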

The five Boolean functions $f_1$, $f_2$, $f_3$, $f_4$ and $f_5$ employed by $H_1$, $H_2$, $H_3$, 
$H_4$ and $H_5$ are constructed from the following bent functions $g_1$, $g_2$, $g_3$ and $g_4$. 

$g_1(x_6, x_5, x_4, x_3, x_2, x_1) = x_1x_4 \oplus x_2x_5 \oplus x_3x_6$ 

$g_2(x_6, x_5, x_4, x_3, x_2, x_1) = x_1x_2x_3 \oplus x_2x_4x_5 \oplus x_1x_2 \oplus x_1x_4 \oplus x_2x_6 \oplus x_3x_5 \oplus x_4x_5$ 

$g_3(x_6, x_5, x_4, x_3, x_2, x_1) = x_1x_2x_3 \oplus x_1x_4 \oplus x_2x_5 \oplus x_3x_6$ 

$g_4(x_6, x_5, x_4, x_3, x_2, x_1) = x_1x_2x_3 \oplus x_2x_4x_5 \oplus x_3x_4x_6 \oplus x_1x_4 \oplus x_2x_6 \oplus x_3x_4 \oplus x_3x_5 \oplus x_3x_6 \oplus x_4x_5 \oplus x_4x_6$ 

These four bent functions were discovered by Rothaus in his pioneering
work [Rot76]. In the same paper, Rothaus also proved that these are the only
bent functions from $V_6$ to $GF(2)$ which are linearly inequivalent in structure.
Let

$$\begin{aligned}
\ell_1(x_6, x_5, x_4, x_3, x_2, x_1) &= x_1, \\
\ell_2(x_6, x_5, x_4, x_3, x_2, x_1) &= x_2, \\
\ell_3(x_6, x_5, x_4, x_3, x_2, x_1) &= x_3, \\
\ell_4(x_6, x_5, x_4, x_3, x_2, x_1) &= x_4.
\end{aligned}$$

By applying Seberry and Zhang’s method, we obtain the first four functions
$f_1$, $f_2$, $f_3$ and $f_4$ as follows:

$$\begin{aligned}
f_i(x_6, x_5, x_4, x_3, x_2, x_1, x_0)
  &= g_i(x_6, x_5, x_4, x_3, x_2, x_1) \oplus x_0\, \ell_i(x_6, x_5, x_4, x_3, x_2, x_1) \oplus x_0 \\
  &= g_i(x_6, x_5, x_4, x_3, x_2, x_1) \oplus x_0 x_i \oplus x_0
\end{aligned}$$

where $i = 1, 2, 3, 4$. The fifth function, which also has the properties P1, P2
and P3, is obtained in the following way. Let

$$h_5(x_6, x_5, x_4, x_3, x_2, x_1) = g_1(x_6, x_5, x_4, x_3, x_2, x_1) \oplus x_1 x_2 x_3 \oplus x_5.$$

Then

$$\begin{aligned}
f_5(x_6, x_5, x_4, x_3, x_2, x_1, x_0)
  &= (1 \oplus x_0)\, g_1(x_6, x_5, x_4, x_3, x_2, x_1) \oplus x_0 \bigl(1 \oplus h_5(x_6, x_5, x_4, x_3, x_2, x_1)\bigr) \\
  &= g_1(x_6, x_5, x_4, x_3, x_2, x_1) \oplus x_0 x_1 x_2 x_3 \oplus x_0 x_5 \oplus x_0
\end{aligned}$$

These functions have a non-linearity of $2^6 - 2^3 = 56$, which is in fact the
maximum non-linearity of functions from $V_7$ to $GF(2)$ [SZ92].
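The stated properties of the five functions can be checked exhaustively, since each has only 128 inputs. The sketch below is an illustration rather than the authors' verification procedure. It assumes that P1 means 0-1 balance, that P2 is measured as the Hamming distance to the nearest affine function, and that P3 is the strict avalanche criterion (flipping any single input bit changes the output for exactly half of the inputs). All helper names are hypothetical.

```python
from itertools import product

POINTS = list(product((0, 1), repeat=7))   # x[i] holds coordinate x_i, i = 0..6

def g1(x): return (x[1] & x[4]) ^ (x[2] & x[5]) ^ (x[3] & x[6])
def g2(x): return ((x[1] & x[2] & x[3]) ^ (x[2] & x[4] & x[5]) ^ (x[1] & x[2])
                   ^ (x[1] & x[4]) ^ (x[2] & x[6]) ^ (x[3] & x[5]) ^ (x[4] & x[5]))
def g3(x): return (x[1] & x[2] & x[3]) ^ (x[1] & x[4]) ^ (x[2] & x[5]) ^ (x[3] & x[6])
def g4(x): return ((x[1] & x[2] & x[3]) ^ (x[2] & x[4] & x[5]) ^ (x[3] & x[4] & x[6])
                   ^ (x[1] & x[4]) ^ (x[2] & x[6]) ^ (x[3] & x[4]) ^ (x[3] & x[5])
                   ^ (x[3] & x[6]) ^ (x[4] & x[5]) ^ (x[4] & x[6]))

def f1(x): return g1(x) ^ (x[0] & x[1]) ^ x[0]
def f2(x): return g2(x) ^ (x[0] & x[2]) ^ x[0]
def f3(x): return g3(x) ^ (x[0] & x[3]) ^ x[0]
def f4(x): return g4(x) ^ (x[0] & x[4]) ^ x[0]
def f5(x): return g1(x) ^ (x[0] & x[1] & x[2] & x[3]) ^ (x[0] & x[5]) ^ x[0]

def is_balanced(f):
    """P1 (assumed reading): the output is 1 on exactly half of the inputs."""
    return sum(f(x) for x in POINTS) == len(POINTS) // 2

def nonlinearity(f):
    """P2 measure: minimum distance from f to any affine function on V_7."""
    best = len(POINTS)
    for a in POINTS:   # a selects the linear function <a, x>
        dist = sum(f(x) ^ (sum(ai & xi for ai, xi in zip(a, x)) & 1) for x in POINTS)
        best = min(best, dist, len(POINTS) - dist)
    return best

def satisfies_sac(f):
    """P3 (assumed reading): every single-bit flip changes f on half of the inputs."""
    for i in range(7):
        changed = sum(f(x) ^ f(x[:i] + (1 - x[i],) + x[i + 1:]) for x in POINTS)
        if changed != len(POINTS) // 2:
            return False
    return True

report = {f.__name__: (is_balanced(f), nonlinearity(f), satisfies_sac(f))
          for f in (f1, f2, f3, f4, f5)}
```

Each entry of `report` comes out as `(True, 56, True)`, which agrees with the non-linearity $2^6 - 2^3 = 56$ quoted from [SZ92].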

Now we show that these functions are linearly inequivalent in structure. We
call the product of several coordinate variables a term. The degree of a term is
the number of coordinate variables in it. The degree of a Boolean function is
the maximum degree among all terms of the function. Thus $f_1$ has five terms
$x_1 x_4$, $x_2 x_5$, $x_3 x_6$, $x_0 x_1$ and $x_0$. The first four terms are of
degree 2, the last term is of degree 1, and hence the degree of $f_1$ is 2.
Consider the case when a linear transformation of coordinates is applied to a
Boolean function $f$ and a new Boolean function $g$ is obtained. Each term of $f$
generates one or more new terms. However, no terms that have higher degrees
than that of the original one can be created. Therefore, all the terms in $g$
which have the highest degree are derived from terms in $f$ which have the same
degree. This implies that linear transformation of coordinates does not change
the degree of a function.

The degrees of the five functions $f_1$, $f_2$, $f_3$, $f_4$ and $f_5$ are 2, 3, 3,
3 and 4 respectively. From the above discussions, we know that $f_1$ and $f_5$
are linearly inequivalent. In addition, neither $f_1$ nor $f_5$ can be
transformed into any of the other three functions $f_2$, $f_3$ and $f_4$ by linear
transformation of coordinates. The other direction is also true. Now consider
$f_2$, $f_3$ and $f_4$. Note that $f_2$ has two degree-3 terms $x_1 x_2 x_3$ and
$x_2 x_4 x_5$, $f_3$ has one degree-3 term $x_1 x_2 x_3$, and $f_4$ has three
degree-3 terms $x_1 x_2 x_3$, $x_2 x_4 x_5$ and $x_3 x_4 x_6$. It was shown in
[Rot76] that the above three sets of degree-3 terms cannot be transformed into
one another by linear transformation of coordinates. From this it follows that
the three functions $f_2$, $f_3$ and $f_4$ are linearly inequivalent. In summary,
$f_1$, $f_2$, $f_3$, $f_4$ and $f_5$ are linearly inequivalent, and hence they have
the property P4.
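The degree argument can be illustrated numerically. The following sketch recovers the algebraic normal form of each function from its truth table with a binary Möbius transform and reads off the degree. It then applies one sample invertible linear change of coordinates, $x_1 \to x_1 \oplus x_2$, to show that the degrees are unchanged. The helper names and the sample transformation are choices made for this example only.

```python
def g1(x): return (x[1] & x[4]) ^ (x[2] & x[5]) ^ (x[3] & x[6])

# x is a tuple (x_0, ..., x_6); each f_i is written out in its expanded form.
def f1(x): return g1(x) ^ (x[0] & x[1]) ^ x[0]
def f2(x): return ((x[1] & x[2] & x[3]) ^ (x[2] & x[4] & x[5]) ^ (x[1] & x[2])
                   ^ (x[1] & x[4]) ^ (x[2] & x[6]) ^ (x[3] & x[5]) ^ (x[4] & x[5])
                   ^ (x[0] & x[2]) ^ x[0])
def f3(x): return (x[1] & x[2] & x[3]) ^ g1(x) ^ (x[0] & x[3]) ^ x[0]
def f4(x): return ((x[1] & x[2] & x[3]) ^ (x[2] & x[4] & x[5]) ^ (x[3] & x[4] & x[6])
                   ^ (x[1] & x[4]) ^ (x[2] & x[6]) ^ (x[3] & x[4]) ^ (x[3] & x[5])
                   ^ (x[3] & x[6]) ^ (x[4] & x[5]) ^ (x[4] & x[6])
                   ^ (x[0] & x[4]) ^ x[0])
def f5(x): return g1(x) ^ (x[0] & x[1] & x[2] & x[3]) ^ (x[0] & x[5]) ^ x[0]

def anf_degree(f, n=7):
    """Degree of f, read from its algebraic normal form."""
    # Truth table indexed by an integer whose bit i is the value of x_i.
    coeffs = [f(tuple((idx >> i) & 1 for i in range(n))) for idx in range(1 << n)]
    # The in-place binary Moebius transform turns the table into ANF coefficients:
    # coeffs[idx] == 1 exactly when the term with variable set idx is present.
    for i in range(n):
        for idx in range(1 << n):
            if idx & (1 << i):
                coeffs[idx] ^= coeffs[idx ^ (1 << i)]
    return max((bin(idx).count("1") for idx, c in enumerate(coeffs) if c), default=0)

def substitute_x1(f):
    """Apply the invertible linear change of coordinates x_1 -> x_1 XOR x_2."""
    return lambda x: f((x[0], x[1] ^ x[2]) + x[2:])

functions = (f1, f2, f3, f4, f5)
degrees = [anf_degree(f) for f in functions]
degrees_after = [anf_degree(substitute_x1(f)) for f in functions]
```

Both lists evaluate to `[2, 3, 3, 3, 4]`. A single sample transformation is of course no proof, but it shows the invariance that the argument above relies on.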

By now we have seen that the five functions $f_1$, $f_2$, $f_3$, $f_4$ and $f_5$
satisfy properties P1, P2, P3 and P4. Verification shows that these five
functions do not have the property P5. By permuting the coordinates of the
functions $f_1$, $f_2$ and $f_3$ according to $\phi_{3,1}$, $\phi_{3,2}$ and
$\phi_{3,3}$ shown in Table 2, we obtain three functions $f_1 \circ \phi_{3,1}$,
$f_2 \circ \phi_{3,2}$ and $f_3 \circ \phi_{3,3}$ that are mutually
output-uncorrelated (i.e., satisfying the property P5). In fact these three
functions are perfectly output-uncorrelated. As permuting coordinates does not
affect the functions with respect to properties P1, P2, P3 and P4, we know that
the three permuted functions $f_1 \circ \phi_{3,1}$, $f_2 \circ \phi_{3,2}$ and
$f_3 \circ \phi_{3,3}$ which are used in the 3-pass case satisfy all the five
properties P1, P2, P3, P4 and P5. All non-zero linear combinations of the three
functions have the maximum non-linearity of 56.
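To see why the unpermuted functions fall short of P5, one can look at the non-zero linear combinations of $f_1$, $f_2$ and $f_3$ before any coordinate permutation is applied. The sketch below is illustrative and takes as an assumption that 0-1 balance of every such combination is a necessary part of P5; the precise definition is given where the properties are introduced. It does not use the permutations of Table 2, so it does not reproduce the figures reported for the permuted functions.

```python
from itertools import combinations, product

POINTS = list(product((0, 1), repeat=7))   # x[i] holds coordinate x_i, i = 0..6

def g1(x): return (x[1] & x[4]) ^ (x[2] & x[5]) ^ (x[3] & x[6])
def f1(x): return g1(x) ^ (x[0] & x[1]) ^ x[0]
def f2(x): return ((x[1] & x[2] & x[3]) ^ (x[2] & x[4] & x[5]) ^ (x[1] & x[2])
                   ^ (x[1] & x[4]) ^ (x[2] & x[6]) ^ (x[3] & x[5]) ^ (x[4] & x[5])
                   ^ (x[0] & x[2]) ^ x[0])
def f3(x): return (x[1] & x[2] & x[3]) ^ g1(x) ^ (x[0] & x[3]) ^ x[0]

FUNCS = {"f1": f1, "f2": f2, "f3": f3}

# Hamming weight (number of 1 outputs) of every non-zero linear combination.
# A weight of 64 out of 128 means the combination is 0-1 balanced.
weights = {}
for size in (1, 2, 3):
    for names in combinations(sorted(FUNCS), size):
        xor_sum = lambda x, names=names: sum(FUNCS[n](x) for n in names) & 1
        weights["+".join(names)] = sum(xor_sum(x) for x in POINTS)

unbalanced = sorted(name for name, w in weights.items() if w != 64)
```

There are $2^3 - 1 = 7$ combinations. Among them, $f_1 \oplus f_3 = x_1 x_2 x_3 \oplus x_0 x_1 \oplus x_0 x_3$ has weight 48 rather than 64, so its output is biased, which is the kind of defect the coordinate permutations are chosen to remove.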

Similarly, by permuting the coordinates of the functions $f_1$, $f_2$, $f_3$ and
$f_4$ according to $\phi_{4,1}$, $\phi_{4,2}$, $\phi_{4,3}$ and $\phi_{4,4}$ shown
in Table 2, we obtain four functions $f_1 \circ \phi_{4,1}$,
$f_2 \circ \phi_{4,2}$, $f_3 \circ \phi_{4,3}$ and $f_4 \circ \phi_{4,4}$ that are
perfectly output-uncorrelated and hence satisfy the property P5. Among the 15
non-zero linear combinations of $f_1 \circ \phi_{4,1}$, $f_2 \circ \phi_{4,2}$,
$f_3 \circ \phi_{4,3}$ and $f_4 \circ \phi_{4,4}$, ten achieve the maximum
non-linearity of 56 and the remaining five achieve 48.

Permuting the coordinates of the functions $f_1$, $f_2$, $f_3$, $f_4$ and $f_5$
according to $\phi_{5,1}$, $\phi_{5,2}$, $\phi_{5,3}$, $\phi_{5,4}$ and
$\phi_{5,5}$ shown in Table 2 yields five functions $f_1 \circ \phi_{5,1}$,
$f_2 \circ \phi_{5,2}$, $f_3 \circ \phi_{5,3}$, $f_4 \circ \phi_{5,4}$ and
$f_5 \circ \phi_{5,5}$ that are mutually output-uncorrelated and hence satisfy
the property P5. Although the permutations do not yield perfectly
output-uncorrelated functions, all the non-zero combinations are either 0-1
balanced or very close to 0-1 balanced. Of the 31 combinations, eight have the
maximum non-linearity of 56, four have 52, fifteen have 48, three have 44 and
one has 32.

The permutations shown in Table 2 are obtained by random sampling. We
have also found many other alternative permutations. The permutations shown
in Table 2 are chosen since they bring the highest average non-linearity to the
linear combinations of the functions.

To compare with MD4, MD5 and SHS, we have listed the Boolean functions
used by these algorithms in Table 3. The main design criterion for these functions
is as follows [Riv92a, Riv92b]: if the input to a function is the result of flipping
independent unbiased coins, then the output of the function should behave in the
same way as the result of flipping an independent unbiased coin as well. This
is equivalent to saying that the functions are all 0-1 balanced, i.e., they satisfy
the property P1, one of our five design criteria. Note that one of the functions,
$x \oplus y \oplus z$, is linear. The other degree-2 functions can be transformed
into one another by linear transformation of coordinates. In particular,
$xy \oplus xz \oplus yz$, $xz \oplus yz \oplus y$ and $y \oplus z \oplus xz \oplus 1$
can all be transformed into $xy \oplus xz \oplus z$ by
$(x \to x \oplus y \oplus 1,\ y \to y,\ z \to z)$,
$(x \to y,\ y \to z,\ z \to x)$ and
$(x \to y \oplus z,\ y \to x \oplus z \oplus 1,\ z \to x)$, respectively. In
addition, it is easy to check that these functions fare very poorly with respect
to correlations among their output sequences.



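Round   MD4                        MD5                               SHS

1       $xy \oplus xz \oplus z$    $xy \oplus xz \oplus z$           $xy \oplus xz \oplus z$
2       $xy \oplus xz \oplus yz$   $xz \oplus yz \oplus y$           $x \oplus y \oplus z$
3       $x \oplus y \oplus z$      $x \oplus y \oplus z$             $xy \oplus xz \oplus yz$
4       -                          $y \oplus z \oplus xz \oplus 1$   $x \oplus y \oplus z$

Table 3. Boolean Functions Used by MD4, MD5 and SHS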
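The three coordinate transformations quoted for the degree-2 functions of Table 3 can be checked by exhaustive evaluation over the eight possible inputs. The following sketch does this; the function names are labels chosen for this example, not names used by MD4, MD5 or SHS.

```python
from itertools import product

def target(x, y, z):        # xy XOR xz XOR z
    return (x & y) ^ (x & z) ^ z

def majority_like(x, y, z):  # xy XOR xz XOR yz
    return (x & y) ^ (x & z) ^ (y & z)

def md5_second(x, y, z):     # xz XOR yz XOR y
    return (x & z) ^ (y & z) ^ y

def md5_fourth(x, y, z):     # y XOR z XOR xz XOR 1
    return y ^ z ^ (x & z) ^ 1

# Each case pairs a source function with the substitution applied to (x, y, z).
cases = [
    (majority_like, lambda x, y, z: (x ^ y ^ 1, y, z)),
    (md5_second,    lambda x, y, z: (y, z, x)),
    (md5_fourth,    lambda x, y, z: (y ^ z, x ^ z ^ 1, x)),
]

# A case matches when the substituted source agrees with the target everywhere.
all_match = [
    all(source(*substitution(x, y, z)) == target(x, y, z)
        for x, y, z in product((0, 1), repeat=3))
    for source, substitution in cases
]
```

All three entries of `all_match` are `True`, so after substitution each function coincides with $xy \oplus xz \oplus z$.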


3.2 Other Design Issues 

At the $i$-th round of Pass 1, $T_{i,7}$ is updated essentially by adding to it
the output of $F_1$ and the $i$-th word $W_i$. This can be viewed as the folding
technique used in ordinary hashing (see page 512 of [Knu73]). Rotation is
employed to destroy the symmetry of addition modulo $2^{32}$. This technique is
also used in the processing of Passes 2, 3, 4 and 5. Inversion of the updating
algorithm $H$ is made computationally infeasible by the addition of its 8-word
input to the output of the last pass.
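A minimal sketch of the two ideas in this paragraph follows: a rotate-then-add update modulo $2^{32}$, and the final addition of the 8-word input to the output of the last pass. The function names, the optional `constant` argument and in particular the default rotation amounts are assumptions made for illustration; the exact step is the one given in the description of the algorithm, not this code.

```python
MASK32 = 0xFFFFFFFF  # words are 32 bits wide

def rotr32(value, shift):
    """Rotate a 32-bit word to the right by shift positions."""
    shift %= 32
    return ((value >> shift) | (value << (32 - shift))) & MASK32

def round_update(t7, f_out, word, constant=0, rot_f=7, rot_t=11):
    """Fold the Boolean-function output and a message word into t7.

    rot_f and rot_t are placeholder rotation amounts (assumed for this
    sketch); rotating before adding breaks the symmetry of plain addition.
    """
    return (rotr32(f_out, rot_f) + rotr32(t7, rot_t) + word + constant) & MASK32

def feed_forward(block_input, last_pass_output):
    """Add the 8-word input to the last pass output, word by word mod 2**32."""
    return [(a + b) & MASK32 for a, b in zip(block_input, last_pass_output)]

# Small usage example with arbitrary values.
rotated = rotr32(0x00000001, 1)
updated = round_update(t7=0, f_out=0, word=5)
folded = feed_forward([0xFFFFFFFF, 1], [1, 2])
```

Because the input is added back in at the end, recovering the input from the output would require knowing the output of the last pass, which itself depends on that input.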

Processing of the five passes is made more distinct by allowing them to per- 
form re-ordering operations upon the words. The word processing orders are 
selected in such a way that no word is processed by the same round at different 
passes and that the orders are as un-related as possible. In addition, constant 
words unique to each round are used in the later four passes. These constant
words have been defined as consecutive bits in the fraction part of $\pi$ to avoid
possible allegation that a trap-door would have been planted in them.

In addition, different permutations on coordinates of $f_1$, $f_2$, $f_3$, $f_4$
and $f_5$ are employed according to the number of passes required. This makes
the hashing algorithm behave more differently when the number of passes changes.




4 Security of HAVAL 


Two messages are said to collide with each other with respect to a one-way
hashing algorithm if they are compressed to the same digest. For HAVAL, there
are two possibilities for a pair of messages to collide: the number of passes
used to process the two messages can be the same or can differ. Ideally, given a
one-way hashing algorithm, we would like to prove formally that it is
computationally infeasible to find a collision pair for the hashing algorithm.
Like many other hashing algorithms such as the MD family, SHS and FFT-hash,
however, HAVAL could not be formally proved to be secure. Recently, Berson has
proposed an attack on a single pass of MD5 [Ber92]. His method applies to a
single pass of HAVAL as well. However, it seems that the attack cannot be
extended to two or more passes.

It is conjectured that the best way to find a collision pair is by using the
birthday attack. In such an attack, an attacker prepares two sets of $2^{n/2}$
distinct messages, and calculates their digests. Here $n$ denotes the number of
bits in a digest, and it can be 128, 160, 192, 224 or 256. Also note that the
number of passes used to compress the two sets of messages may differ. The
attacker can check (by, for instance, sorting) whether there is any collision
pair of messages, one from the first set and the other from the second set. The
attacker will succeed with a probability of about 0.5. However, such an attack
requires the order of $2^{n/2}$ operations, which is impractical even for
$n = 128$. It is also conjectured that given a digest, it requires the order of
$2^n$ operations to obtain a message that is mapped to the digest.
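The scaling of the birthday attack can be observed on a toy digest. The sketch below is an illustration under stated assumptions: HAVAL is not used, a truncated SHA-256 from Python's standard `hashlib` stands in for an $n$-bit digest, and instead of fixing both sets at $2^{n/2}$ messages the second set is extended until a match with the first set appears. A dictionary lookup takes the place of the sorting step. The function names and message formats are hypothetical.

```python
import hashlib

def truncated_digest(message: bytes, n_bits: int) -> int:
    """Top n_bits of SHA-256, used here only as a stand-in n-bit digest."""
    full = int.from_bytes(hashlib.sha256(message).digest(), "big")
    return full >> (256 - n_bits)

def birthday_collision(n_bits: int, max_tries: int = 1 << 20):
    """Look for one message from each of two sets with equal n-bit digests."""
    set_size = 1 << (n_bits // 2)
    # First set: 2^(n/2) messages, indexed by digest for constant-time lookup.
    first = {}
    for i in range(set_size):
        message = b"first-%d" % i
        first[truncated_digest(message, n_bits)] = message
    # Second set: keep generating messages until one hits a stored digest.
    for j in range(max_tries):
        candidate = b"second-%d" % j
        digest = truncated_digest(candidate, n_bits)
        if digest in first:
            return first[digest], candidate, j + 1
    return None

# With n = 16 the first set holds 2^8 messages, and a match is expected
# after roughly 2^8 messages from the second set.
result = birthday_collision(16)
```

The number of trials grows like $2^{n/2}$, so the same search at $n = 128$ would need on the order of $2^{64}$ digest computations and a comparable amount of storage.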


5 Extensions and Future Work 


The algorithm can be extended in several directions. Firstly, we note that the
number of passes can be increased by adding more functions into the function
set $\{f_1, f_2, f_3, f_4, f_5\}$.

It is well known that for any $k \geq 4$, there are at least $k$ linearly
inequivalent bent functions from $V_{2k}$ to $GF(2)$. Thus by using the same
approach as described in Section 3.1, we can design, at least in theory, four or
more functions from $V_{2k+1}$ to $GF(2)$ that have the properties P1, P2, P3, P4
and P5. In this way, we can design one-way hashing algorithms that compress an
arbitrarily long message into a digest of $32(2k + 2)$ or fewer bits, where
$k \geq 4$.

We also note that although HAVAL is designed primarily for 32-bit machines, 
hashing algorithms suited to more advanced platforms such as 64-bit machines 
can be obtained by modifying the definition of a word. 

The efficiency of the algorithm can be improved if we can find simpler replace- 
ments for the five functions. It is a future research subject to search for other 
approaches that might lead to simpler functions having the five properties. 



6 Conclusions 


We have proposed a new one-way hashing algorithm HAVAL that can compress
an arbitrarily long message into a digest of 128, 160, 192, 224 or 256 bits. To
meet the needs of various practical applications, HAVAL also provides the
flexibility to change the number of passes in which message blocks are
processed. A great deal of attention has been paid to the design of the five
Boolean functions used by the algorithm. We expect that it requires the order of
$2^{n/2}$ operations to find a pair of collision messages, where $n$ is the length
of a digest. We also expect that the algorithm would be widely used in practical
applications where digests of variable length are required.


7 Acknowledgments 

The authors are grateful to Xian-Mo Zhang for his invaluable contribution to this
project. This work would have been impossible without his insight into the
construction of cryptographically useful Boolean functions. We would also like
to thank Tor Nordhagen for his help in testing and programming.


References 

[Ber92] Thomas A. Berson. Differential cryptanalysis mod $2^{32}$ with applications to
MD5. In Advances in Cryptology - Proceedings of EuroCrypt’92, Lecture
Notes in Computer Science. Springer-Verlag, 1992. (to appear).

[Dam87] I. Damgård. Collision free hash functions and public key signature schemes.
In Advances in Cryptology - Proceedings of EuroCrypt’87, Lecture Notes in
Computer Science. Springer-Verlag, 1987.

[Dam90] I. Damgård. A design principle for hash functions. In G. Brassard, editor,
Advances in Cryptology - Proceedings of Crypto’89, Lecture Notes in
Computer Science, Vol.435, pages 416-427. Springer-Verlag, 1990.

[DH76] W. Diffie and M. Hellman. New directions in cryptography. IEEE
Transactions on Information Theory, IT-22(6):472-492, 1976.

[Kal92] B. Kaliski. The MD2 message digest algorithm, April 1992. Request for 
Comments (RFC) 1319. 

[Knu73] Donald E. Knuth. The Art of Computer Programming, Sorting and Searching,
volume 3. Addison-Wesley, 1973.

[Mer78] R. Merkle. Secure communication over insecure channels. Communications 
of the ACM, 21:294-299, 1978. 

[NIS91] NIST. A proposed federal information processing standard for digital signa- 
ture standard (DSS), August 1991. 

[NIS92] NIST. A proposed federal information processing standard for secure hash 
(SHS), January 1992. 

[NY89] M. Naor and M. Yung. Universal one-way hash functions and their crypto- 
graphic applications. In Proceedings of the 21-st ACM Symposium on Theory 
of Computing, pages 33-43, 1989. 

[Riv92a] R. Rivest. The MD4 message digest algorithm, April 1992. Request for 
Comments (RFC) 1320. (Also presented at Crypto’90, 1990). 



[Riv92b] R. Rivest. The MD5 message digest algorithm, April 1992. Request for
Comments (RFC) 1321.

[Rom90] J. Rompel. One-way functions are necessary and sufficient for secure signa- 
tures. In Proceedings of the 22-nd ACM Symposium on Theory of Computing , 
pages 387-394, 1990. 

[Rot76] O. S. Rothaus. On “bent” functions. Journal of Combinatorial Theory (A),
20:300-305, 1976.

[Sch92] C. P. Schnorr. FFT-Hash II, efficient cryptographic hashing, April 1992. Pre- 
sented at EuroCrypt’92. 

[SZ92] Jennifer Seberry and Xian-Mo Zhang. Highly nonlinear 0-1 balanced boolean 
functions satisfying strict avalanche criterion, 1992. AusCrypt’92, Gold Coast. 

[Vau92] Serge Vaudenay. FFT-Hash-II is not yet collision-free. In Rump Session, 
Crypto ’92, 1992. 

[ZMI91] Y. Zheng, T. Matsumoto, and H. Imai. Structural properties of one-way hash
functions. In A. J. Menezes and S. A. Vanstone, editors, Advances in
Cryptology - Proceedings of Crypto ’90, Lecture Notes in Computer Science,
Vol.537, pages 303-311. Springer-Verlag, 1991.

